In this notebook we'll build a dashboard that we can share with others.
We'll build the dashboard in IPython Notebook and then use the nbconvert tool from Jupyter (the new name for IPython Notebook) to convert the notebook into another format, specifically HTML. We can then share that HTML page with others.
Step 1: Ensure you have the latest version of Jupyter
conda install jupyter
Step 2: Install the nbconvert utility
pip install nbconvert
Step 3: Build the dashboard - that will be all the code that follows
Step 4: Use the nbconvert utility to convert the notebook to an HTML file.
jupyter nbconvert --to html --execute '2 - Build a Shareable Dashboard Using iPython Notebook and Matplotlib.ipynb'
The --execute flag runs the notebook before exporting it. This lets us automate the creation of the dashboard and ensures that it uses the latest data from either the database or a data file.
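If you want to trigger the export from a script (for example, on a schedule), the command can be wrapped in Python. The following is a minimal sketch, not part of the original recipe: the helper names `build_nbconvert_command` and `refresh_dashboard` and the file name `dashboard.ipynb` are hypothetical, and it assumes the `jupyter` executable is on your PATH.

```python
import subprocess


def build_nbconvert_command(notebook_path, execute=True, output_format="html"):
    """Return the nbconvert command line as a list of arguments."""
    command = ["jupyter", "nbconvert", "--to", output_format]
    if execute:
        # Run every cell before exporting so the output reflects current data
        command.append("--execute")
    command.append(notebook_path)
    return command


def refresh_dashboard(notebook_path):
    """Re-run and export the notebook; raises CalledProcessError on failure."""
    subprocess.run(build_nbconvert_command(notebook_path), check=True)


# Hypothetical notebook name; substitute your own file.
command = build_nbconvert_command("dashboard.ipynb")
```

Passing the arguments as a list avoids shell quoting problems with notebook names that contain spaces, such as the one used in this recipe.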
If you'd like more information check out the following links:
* nbconvert: http://nbconvert.readthedocs.org/en/latest/index.html
* Jupyter: http://jupyter.readthedocs.org/en/latest/index.html
For this dashboard we will generate a number of charts from Chapter 4, specifically the distribution analysis, categorical variable analysis, and time-series analysis. Our dataset is static, but in practice you will want to point the dashboard to a file that is updated regularly, or use a database query that pulls data for a specific time period, such as the last seven days.
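To illustrate the database option, here is a rough sketch of a "last seven days" query using an in-memory SQLite database. The table name `accidents`, the column names, the helper `fetch_recent_accidents`, and the sample rows are all made up for this example; the recipe itself reads from a CSV file.

```python
import sqlite3
from datetime import date, timedelta


def fetch_recent_accidents(connection, today, days=7):
    """Return rows dated within the last `days` days, including `today`."""
    start = today - timedelta(days=days - 1)
    query = (
        "SELECT accident_date, number_of_casualties "
        "FROM accidents "
        "WHERE accident_date BETWEEN ? AND ? "
        "ORDER BY accident_date"
    )
    # ISO-formatted date strings sort chronologically, so BETWEEN works on text
    return connection.execute(
        query, (start.isoformat(), today.isoformat())
    ).fetchall()


# Throwaway in-memory database with invented rows, for demonstration only
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE accidents (accident_date TEXT, number_of_casualties INTEGER)"
)
connection.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("2004-12-20", 2), ("2004-12-27", 1), ("2004-12-31", 3)],
)

recent = fetch_recent_accidents(connection, today=date(2004, 12, 31))
```

In a scheduled dashboard you would pass `date.today()` as `today`. The same query and parameters can also be handed to `pd.read_sql_query` to get a DataFrame directly.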
In [ ]:
# Import the Python libraries we need
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
# Define a variable for the accidents data file
accidents_data_file = '/Users/robert.dempsey/Dropbox/Private/Art of Skill Hacking/Books/' \
'Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
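# Note: tupleize_cols, error_bad_lines and warn_bad_lines were removed from
# recent pandas releases; on_bad_lines='warn' (pandas 1.3+) skips malformed
# rows and emits a warning for each one.
accidents = pd.read_csv(accidents_data_file,
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=True,
                        on_bad_lines='warn',
                        skip_blank_lines=True,
                        low_memory=False
                        )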
In [ ]:
fig = plt.figure()
ax = fig.add_subplot(111)
# Draw the histogram once, spanning the full range of the data
weather = accidents['Weather_Conditions']
counts, bins, patches = ax.hist(weather,
                                range=(weather.min(), weather.max()),
                                facecolor='green',
                                edgecolor='gray')
# Put a tick at each bin edge
ax.set_xticks(bins)
plt.title('Weather Conditions Distribution')
plt.xlabel('Weather Condition')
plt.ylabel('Count of Weather Condition')
plt.show()
In [ ]:
accidents.boxplot(column='Light_Conditions',
return_type='dict');
In [ ]:
accidents.boxplot(column='Light_Conditions',
by = 'Weather_Conditions',
return_type='dict');
In [ ]:
# Group once and reuse the result for both series
casualties_by_day = accidents.groupby('Day_of_Week').Number_of_Casualties
# count() gives the number of accident records for each day of the week
casualty_count = casualties_by_day.count()
# sum() / count() is the average number of casualties per accident
casualty_probability = casualties_by_day.sum() / casualties_by_day.count()
fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(121)
# Plot first, then label, so pandas does not overwrite the axis labels
casualty_count.plot(kind='bar', ax=ax1)
ax1.set_xlabel('Day of Week')
ax1.set_ylabel('Casualty Count')
ax1.set_title("Casualties by Day of Week")
ax2 = fig.add_subplot(122)
casualty_probability.plot(kind='bar', ax=ax2)
ax2.set_xlabel('Day of Week')
ax2.set_ylabel('Probability of Casualties')
ax2.set_title("Probability of Casualties by Day of Week")
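The second chart divides the summed casualties by the number of records in each group, which is the group mean. The sketch below checks that on a tiny, invented DataFrame (the rows are made up for illustration and are not taken from the accidents data).

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Day_of_Week": [1, 1, 2, 2, 2],
    "Number_of_Casualties": [1, 3, 2, 2, 5],
})

grouped = toy.groupby("Day_of_Week")["Number_of_Casualties"]

# The ratio computed in the cell above: total casualties / number of records
ratio = grouped.sum() / grouped.count()

# The built-in group mean gives the same values
mean = grouped.mean()

print(ratio)
print(mean)
```

Because the ratio is an average number of casualties per accident, it can exceed 1, so it is worth reading the "probability" label on the chart with that in mind.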
In [ ]:
# Create a dataframe containing the total number of casualties by date
casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': 'sum'})
# Convert the index to a DateTimeIndex
casualty_count.index = pd.to_datetime(casualty_count.index)
# Sort the index so the plot looks correct
casualty_count.sort_index(inplace=True,
ascending=True)
# Plot the data
casualty_count.plot(figsize=(18, 4))
In [ ]:
# Plot one year of the data
casualty_count.loc['2000'].plot(figsize=(18, 4))
In [ ]:
# Plot the yearly total casualty count for each year in the 1980s
casualties_1980s = casualty_count.loc['1980-01-01':'1989-12-31']
the1980s = casualties_1980s.groupby(casualties_1980s.index.year).sum()
# Show the plot
the1980s.plot(kind='bar',
              figsize=(18, 4))
In [ ]:
# Plot the 1980s data as a line graph to better see the differences between years
the1980s.plot(figsize=(18, 4))
To share this dashboard as an HTML file, run the following command in the directory where the notebook is located:
jupyter nbconvert --to html --execute '2 - Build a Shareable Dashboard Using iPython Notebook and Matplotlib.ipynb'